Measuring Semantic Similarity in Short Texts through Greedy Pairing and Word Semantics

Authors

  • Mihai C. Lintean
  • Vasile Rus

Abstract

We propose in this paper a greedy method for the problem of measuring semantic similarity between short texts. Our method is based on the principle of compositionality, which states that the overall meaning of a sentence can be captured by summing up the meanings of its parts, i.e., in our case, the meanings of its words. Based on this principle, we extend word-to-word semantic similarity metrics to quantify semantic similarity at the sentence level. We report results using several word-to-word semantic similarity metrics, based either on word knowledge or on vectorial representations of meaning. Our approach performs better than similar approaches on the tasks of paraphrase identification and recognizing textual entailment, two illustrative semantic similarity tasks. We also report the effect of word weighting and of function words on the performance of the proposed method.
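The abstract describes the method only at a high level, so the following Python sketch illustrates one plausible reading of greedy pairing; it is not the authors' implementation. Several assumptions are made here that the abstract does not confirm: pairing is one-to-one (each word is used at most once), `word_sim` is a hypothetical toy lexicon standing in for the knowledge-based or vector-based word-to-word metrics the paper evaluates, `FUNCTION_WORDS` is a hypothetical illustrative list, the optional `weights` dictionary (e.g., IDF values) is a hypothetical way to model word weighting, and the score is normalized by the average total weight of the two texts.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon standing in for a real word-to-word metric;
# the values are illustrative only, not taken from the paper.
TOY_RELATED: Dict[frozenset, float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"quick", "fast"}): 0.8,
    frozenset({"buy", "purchase"}): 0.85,
}

# Hypothetical minimal function-word list, for illustration only.
FUNCTION_WORDS: Set[str] = {"a", "an", "the", "of", "to", "and", "is", "he", "she"}


def word_sim(a: str, b: str) -> float:
    """Toy word-to-word similarity: 1.0 for identical words, else a lexicon score."""
    if a == b:
        return 1.0
    return TOY_RELATED.get(frozenset({a, b}), 0.0)


def tokenize(text: str, drop_function_words: bool) -> List[str]:
    """Lowercase whitespace tokenization with simple punctuation stripping."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if drop_function_words:
        tokens = [t for t in tokens if t not in FUNCTION_WORDS]
    return tokens


def greedy_similarity(
    t1: str,
    t2: str,
    weights: Optional[Dict[str, float]] = None,
    drop_function_words: bool = False,
) -> float:
    """Greedy one-to-one word pairing; returns a score in [0, 1]."""
    w1 = tokenize(t1, drop_function_words)
    w2 = tokenize(t2, drop_function_words)
    if not w1 or not w2:
        return 0.0

    def weight(word: str) -> float:
        # Optional per-word weight (assumed, e.g. IDF); defaults to 1.0.
        return weights.get(word, 1.0) if weights else 1.0

    # Score every cross-text word pair, then visit pairs from best to worst.
    candidates: List[Tuple[float, int, int]] = [
        (word_sim(a, b), i, j) for i, a in enumerate(w1) for j, b in enumerate(w2)
    ]
    candidates.sort(key=lambda c: -c[0])

    used1: Set[int] = set()
    used2: Set[int] = set()
    total = 0.0
    for sim, i, j in candidates:
        if sim <= 0.0:
            break  # remaining pairs contribute nothing
        if i in used1 or j in used2:
            continue  # each word is paired at most once
        used1.add(i)
        used2.add(j)
        total += sim * (weight(w1[i]) + weight(w2[j])) / 2.0

    # Normalize by the average total weight of the two texts.
    norm = (sum(weight(w) for w in w1) + sum(weight(w) for w in w2)) / 2.0
    return total / norm


print(round(greedy_similarity("The quick car", "The fast automobile"), 3))  # 0.9
print(round(greedy_similarity("The quick car", "The fast automobile",
                              drop_function_words=True), 3))  # 0.85
```

Because pairs are taken strictly in descending order of similarity, an early high-scoring pair can block a globally better assignment; this is the usual trade-off of a greedy matching compared with an optimal one. Dropping the function word "the" in the example lowers the score, since an exact match is removed from both texts.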

Similar sources

Conceptual changes of Mihrab, emphasizing on third and fourth century AH sources

Mihrab existed before Islam. This word became one of the main components of Islamic mosques after the Islamic conquest. Structure and function changes of Mihrab during these two periods could be considered in various methods. Considering conceptual changes is one of the historical studies methods. This article aims to investigate a part of conceptual changes that reflects Mihrab’s structural an...

The SIMILAR Corpus: A Resource To Foster The Qualitative Understanding of Semantic Similarity of Texts

We describe in this paper the SIMILAR corpus which was developed to foster a deeper and qualitative understanding of word-to-word semantic similarity metrics and their role on the more general problem of text-to-text semantic similarity. The SIMILAR corpus fills a gap in existing resources that are meant to support the development of text-to-text similarity methods based on word-level similarit...

The Semantics of the Word Istikbar (Arrogance) in the Holy Quran Based on Syntagmatic Relations (A Case Study of Semantic Proximity and Semantic Contrast)

The word istikbar (arrogance) is one of the key words in the monotheistic system of the Quran, which has found a special status as a special feature of the opponents and adversaries of the call to the truth. Given the prominent role of this issue in the human life system and its provision of corruption and moral deviations, it is necessary to represent the nature of the elements that make up th...

Semantic Similarity of Short Texts

This paper presents a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and normalized and modified versions of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. In this paper, we focus on computing the sim...

A Method for Measuring Sentence Similarity and Its Application to Conversational Agents

This paper presents a novel algorithm for computing similarity between very short texts of sentence length. It will introduce a method that takes account of not only semantic information but also word order information implied in the sentences. Firstly, semantic similarity between two sentences is derived from information from a structured lexical database and from corpus statistics. Secondly, ...

Publication date: 2012